Data processing system and method

ABSTRACT

A method of optimizing an application in a system having a plurality of processors, the method comprising: analyzing the application for a first period to obtain a first activity analysis; selecting one of the processors based on the activity analysis for running the application; and binding the application to the selected processor.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of co-pending U.S. patent application Ser. No. 12/021,971, filed Jan. 29, 2008, which claims priority to Indian Patent Application Serial No. 207/CHE/2007, filed in India on Jan. 31, 2007 (now abandoned), the entire contents of which are hereby incorporated by reference as though fully set forth herein.

BACKGROUND TO THE INVENTION

In a data processing system with multiple processors, an operating system will schedule a thread to execute on a processor that becomes free for executing a thread.

A thread of a program running on the system can be bound to a selected processor. The thread will only be executed by the selected processor. For example, the operating system will only schedule the thread to be executed by the selected processor.

Enterprise servers, such as, for example, web servers or database servers, often contain multiple processors.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will now be described by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 shows an embodiment of a method of optimizing an application;

FIG. 2 shows an embodiment of a method of optimizing an application in more detail; and

FIG. 3 shows an example of a data processing system suitable for implementing embodiments of the invention.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

Embodiments of the invention can be used to automatically optimize an application (that is, a thread or a process) on a data processing system having a plurality of processors or cells.

In a system with a plurality of processors (logical and/or physical processors), an application may run more efficiently on certain processors. For example, on certain processors, more instructions of an application may be executed in a given time period and/or the application may complete its tasks more quickly. The efficiency of an application when executing on a processor can be affected by a number of factors. For example, in a system with cache-coherent non-uniform memory access (ccNUMA) architecture, the location of an application's data in memory can affect the efficiency of the thread on the various processors of the system. In such architectures (for example, cell-based architectures), each processor is associated with its own area of memory. An application will communicate faster with memory associated with the processor on which it is running (i.e. memory within the cell), than with memory associated with other cells.

Also, in certain systems, a system component, such as a network interface card (NIC), may be configured to interrupt a predetermined processor when it receives a data packet, and an application that processes the data packet may run more efficiently on the predetermined processor than on other processors.

Embodiments of the invention recognise that an application will execute more efficiently on a certain processor. The application may be a user application, or an application that is part of the operating system on a data processing system, or some other application. An application executing more efficiently may have a higher throughput and/or may conclude more quickly. Embodiments of the invention may therefore analyze the system and the application to determine a processor that would execute the application efficiently, and may then bind the application to that processor. For example, embodiments of the invention determine which system components the application interacts with, and/or which areas of memory the application interacts with, and bind the application to a selected processor accordingly. For example, where an application interacts with a particular NIC, or interacts with a particular NIC more than any other NIC, then embodiments of the invention may bind the application to a processor that is configured to be interrupted by the particular NIC. Where an application interacts with a particular area of memory in a cell-based architecture, or interacts more with a particular area of memory than other areas of memory, then embodiments of the invention may bind the application to the processor associated with the particular area of memory.

Embodiments of the invention may improve a number of types of applications. For example, purchasing from an online shopping web site is handled by a data processing system. The data processing system may include one or more applications that handle transactions for buying products. Embodiments of the invention can be used to improve the performance of the applications so that, for example, transactions are processed and completed more quickly by the data processing system, the data processing system may be able to handle more transactions simultaneously, and/or the applications may be executed on a data processing system of reduced capabilities (and therefore reduced cost) with little or no reduction in performance.

FIG. 1 shows a method 100 of optimizing an application according to embodiments of the invention. The method starts at step 102, which is the information gathering phase. In this phase, information is gathered on the configuration of the data processing system and the activity of the application to be optimized. The next step is step 104, the tuning phase, where the application will be tuned on the data processing system. For example, the application will be bound to a processor that is associated with a system component and/or an area of memory with which the application has interacted in the information gathering phase 102. The method 100 continues from step 104 to step 106, the verification phase, where the performance of the application before and after the tuning phase 104 is compared, and one or more changes will be undone if the performance of the application has degraded.

FIG. 2 shows a method 200 of optimizing an application in more detail. The information gathering phase 102 of the method 100 of FIG. 1 comprises a step 202 of gathering system information, followed by a step 204 of analyzing the application executing on the data processing system for a first period. The application may be an application that was running before the method 200 of optimizing the application started, or an application that was not running before the method started but was started before the step 204 of analyzing the application for a first period.

The step 202 of gathering system information comprises obtaining information on which processors and/or cells are present in the data processing system, which system components (such as, for example, NICs) are present in the data processing system, and which processors are interrupted by the system components.

The step 204 of analyzing the application for a first period comprises obtaining a first activity analysis of the application and obtaining a first performance analysis of the application. Obtaining a first activity analysis of the application comprises analyzing the application's interactions with any system components and, in a ccNUMA or cell-based architecture, analyzing the application's interactions with areas of memory. The first activity analysis may also include the utilization of each processor in the data processing system by all applications utilizing the processors. Obtaining a first performance analysis of the application comprises analyzing the application's utilization percentage of the processor on which it is executing over the first period, and/or obtaining a cycles-per-instruction (CPI) value for the application over the first period.

Once the first period is over, the method moves from step 204 to the tuning phase. The tuning phase comprises a step 206 of selecting a processor for executing the application, and a step 208 of binding the application to the processor selected in step 206.

In step 206, a processor is selected for running the application, based on the first activity analysis. This may be based on one or more of a number of factors. The activity analysis may reveal that the application processes packets from one or more system components. The activity analysis may additionally or alternatively reveal that the application interacts with one or more areas of memory, where each area of memory is associated with a particular processor in a cell-based architecture.

In certain embodiments, where the application processes packets from one or more system components, then the selected processor may be a processor that is interrupted by one of the system components. For example, if the application processes a large number of packets from a system component, such as a number of packets above a threshold amount, then the selected processor will be the processor that is interrupted by that system component. If the application processes a large number of packets from a number of system components, then the system component providing the largest number of packets may be considered, and the processor that is interrupted by that component selected.

If a processor is not selected as above, then a processor may be selected based on the application's interaction with memory, where the data processing system comprises a cell-based architecture. For example, where the application interacts with a memory area associated with a processor, then that processor will be selected. Where the application interacts with multiple areas of memory associated with respective processors, then a processor will be selected that will provide the application with the greatest performance enhancement. For example, the application may have interacted with the area of memory associated with the selected processor the greatest number of times during the first period, or the area of memory associated with the selected processor may have provided the application with the greatest proportion of memory used by the application.
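The selection order described above (system components first, then memory areas) can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name, the dictionary-based inputs and the `packet_threshold` parameter are assumptions introduced for the example.

```python
from typing import Dict, Optional


def select_processor(
    packets_by_component: Dict[str, int],
    interrupt_target: Dict[str, int],
    memory_hits_by_processor: Dict[int, int],
    packet_threshold: int,
) -> Optional[int]:
    """Pick a processor from a (hypothetical) first activity analysis.

    packets_by_component: packets the application processed per component.
    interrupt_target: processor interrupted by each component.
    memory_hits_by_processor: interactions with the memory area
        associated with each processor.
    """
    # Components from which the application processed more packets
    # than the threshold, and whose interrupt target is known.
    busy = {
        comp: count
        for comp, count in packets_by_component.items()
        if count > packet_threshold and comp in interrupt_target
    }
    if busy:
        # Consider the component that provided the most packets.
        top_component = max(busy, key=busy.get)
        return interrupt_target[top_component]

    # Otherwise fall back to the memory area used most often.
    active = {cpu: n for cpu, n in memory_hits_by_processor.items() if n > 0}
    if active:
        return max(active, key=active.get)

    return None  # nothing in the analysis favours any processor
```

Alternative embodiments that consider memory usage first would simply swap the two stages.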

In alternative embodiments, a processor may be selected based on memory usage before interaction with system components.

A processor may also be selected based on the first performance analysis obtained during the first period. For example, a processor may not be selected where the utilisation of the processor by all applications exceeded a threshold level.

Once a processor has been selected in step 206, the method 200 advances to step 208 where the application is bound to the selected processor. This causes the operating system to schedule the application to execute on the selected processor during subsequent execution. As a result, the application will only be executed by the selected processor, and the application may subsequently be executed more efficiently by the selected processor, and hence by the data processing system. If the selected processor is the processor which was executing the application before the binding in step 208, then the application may still be bound to that processor, as this may prevent or reduce the chance of the operating system subsequently scheduling the application on a different processor.
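As an illustration of the binding step on one operating system, the sketch below uses the Linux-only `os.sched_setaffinity` and `os.sched_getaffinity` calls exposed by Python. The choice of Linux, the helper names and the idea of returning the previous affinity so that the binding can later be undone are assumptions made for this example.

```python
import os
from typing import Optional, Set


def bind_to_processor(pid: int, cpu: int) -> Optional[Set[int]]:
    """Bind process `pid` (0 means the caller) to a single processor.

    Returns the previous set of allowed processors so the binding can
    be undone, or None where the platform offers no affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows: nothing to do in this sketch
    previous = set(os.sched_getaffinity(pid))
    os.sched_setaffinity(pid, {cpu})
    return previous


def restore_affinity(pid: int, previous: Optional[Set[int]]) -> None:
    """Undo a binding made by bind_to_processor (hypothetical helper)."""
    if previous is not None and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(pid, previous)
```

Keeping the previous affinity set around makes it straightforward to remove the binding later if the application turns out to run less efficiently.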

Once the application has been bound to the selected processor in step 208, the method 200 advances to the verification phase, comprising a step 210 of analyzing the application executing on the data processing system for a second period, a step 212 of comparing the performance of the application in step 210 with that in step 204, and a step 214 of undoing the changes (i.e. the binding of the application to the selected processor) if necessary.

In step 210, the performance of the application is analyzed for a second period to obtain a second performance analysis. Obtaining a second performance analysis of the application comprises analyzing the application's utilization percentage of the processor on which it is executing over the second period, and/or obtaining a cycles-per-instruction (CPI) value for the application over the second period. Before the second period begins, embodiments of the invention may wait for a predetermined settling time to allow the selected processor to perform any initializations that occur when the application is first executed on the selected processor, such as population of the processor's cache due to memory accesses by the application, as the application may run inefficiently during this time.

Once the second performance analysis has been obtained in step 210, the method 200 advances to step 212 where the first performance analysis, obtained in step 204, is compared with the second performance analysis, to determine if the performance of the system has degraded, for example if the application is running less efficiently on the selected processor than its original processor. If this is the case, then, in step 214 of the method 200, the binding of the application to the processor selected in step 206 is removed. The application may then be bound to the processor that was executing the application before the start of the method 200. From step 214, the method 200 ends at step 216.
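The comparison in step 212 and the decision in step 214 might be sketched as below, using CPI as the only performance measure and taking a higher CPI to mean less efficient execution. The function names and the `tolerance` parameter (a margin that ignores small measurement noise) are assumptions for illustration only.

```python
def should_undo_binding(first_cpi: float, second_cpi: float,
                        tolerance: float = 0.0) -> bool:
    """Return True if performance degraded after the binding.

    Degradation here means the CPI measured over the second period
    exceeds the CPI of the first period by more than the (assumed)
    relative tolerance.
    """
    return second_cpi > first_cpi * (1.0 + tolerance)


def processor_after_verification(first_cpi: float, second_cpi: float,
                                 selected_cpu: int, original_cpu: int) -> int:
    """Processor the application should stay bound to after step 214."""
    if should_undo_binding(first_cpi, second_cpi):
        # Binding is removed; the application may be re-bound to the
        # processor that was executing it before the method started.
        return original_cpu
    return selected_cpu
```

For example, with a first-period CPI of 1.2 and a second-period CPI of 1.5, the sketch reports degradation and returns the original processor.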

The effective cycles per instruction (CPI) value of an application can be calculated over a period of time using the following formula:

$\mathrm{CPI} = \frac{\text{total clock cycles}}{\text{instructions retired} - \text{NOPs retired}}$

A higher CPI value indicates that a processor requires more clock cycles for executing each instruction. Therefore, where all of the relevant processors of the data processing system have the same or similar clock frequency, a higher CPI value indicates less efficient execution of the application.
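As a worked example of the effective CPI formula, the short sketch below computes the value from three counters. The function name and the sample counter values are hypothetical; how the counters are read from the hardware is outside the scope of this sketch.

```python
def effective_cpi(total_clock_cycles: int,
                  instructions_retired: int,
                  nops_retired: int) -> float:
    """Effective CPI: cycles divided by retired non-NOP instructions."""
    useful_instructions = instructions_retired - nops_retired
    if useful_instructions <= 0:
        raise ValueError("no non-NOP instructions retired in the period")
    return total_clock_cycles / useful_instructions


# 3,000,000 cycles over 2,200,000 retired instructions, of which
# 200,000 were NOPs: 3,000,000 / 2,000,000 = 1.5 cycles per instruction.
example = effective_cpi(3_000_000, 2_200_000, 200_000)
```

With the same counters but 2,400,000 cycles, the result would be 1.2, indicating more efficient execution on processors of similar clock frequency.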

Embodiments of the invention may omit the verification phase if it is assumed that the application will execute more efficiently on the selected processor, and/or the overall data processing system will run more efficiently.

Embodiments of the invention may optimize a number of applications running on a data processing system, and not just a single application. For example, embodiments of the invention may carry out the method 200 of optimizing an application on each application to be optimized in turn.

Embodiments of the invention may be implemented on a number of operating systems. For example, embodiments of the invention can be implemented on the HP-UX operating system. In this case, certain system calls may be used to gather information about the data processing system and/or any applications running on it. For example, a list of processors in the data processing system and, with cell-based architectures, a list of cells may be obtained using the pstat( ) and/or mpctl( ) system calls, information on system components (such as NICs) can be obtained using the dlpi( ) and ioctl( ) system calls, and the mpctl( ) system call can be used to bind applications to certain processors. Other operating systems may provide similar system calls or other facilities to obtain the information.

Embodiments of the invention may be used to optimize multiple applications on a data processing system. During the first period, the data processor utilization of all applications on the data processing system is measured, and the applications with the highest data processor utilization are selected. For example, the ten applications with the highest utilization are selected, or applications with a utilization above a threshold value are selected. Alternatively, a list of selected applications is provided.
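The two selection rules mentioned above (a fixed number of the busiest applications, or all applications above a threshold) could be sketched as follows. The function name, parameter names and sample figures are assumptions for illustration.

```python
from typing import Dict, List, Optional


def select_applications(utilization: Dict[str, float],
                        top_n: Optional[int] = None,
                        threshold: Optional[float] = None) -> List[str]:
    """Choose applications to optimize from measured utilization (%).

    If `threshold` is given, keep applications above it; otherwise keep
    the `top_n` applications with the highest utilization. The result
    is ordered from highest to lowest utilization.
    """
    ranked = sorted(utilization, key=utilization.get, reverse=True)
    if threshold is not None:
        return [app for app in ranked if utilization[app] > threshold]
    if top_n is not None:
        return ranked[:top_n]
    return ranked


# Hypothetical measurements over the first period.
measured = {"web": 62.0, "db": 81.5, "cron": 0.4, "log": 7.1}
busiest_two = select_applications(measured, top_n=2)
above_five = select_applications(measured, threshold=5.0)
```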

After the analysis period, a data processor preference list is created for each selected application. The preference list is ordered based on the performance benefit expected on each data processor. For example, where an application communicates a large amount with a NIC that interrupts a first data processor and a small amount with a NIC that interrupts a second data processor, then the first data processor may appear top of the preference list, followed by the second data processor, followed by, for example, a third data processor.

Each selected application is bound to the data processor at the top of the preference list for that application. If this is not possible, for example if the utilization on the data processor was above a threshold value during the first period, then the next data processor in the preference list is chosen, and so on until the application is bound to a data processor.
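A preference list and the walk down it might look like the sketch below. Ranking processors by the number of packets received from the NICs that interrupt them is one possible reading of "expected performance benefit"; that scoring, the function names and the `max_utilization` parameter are assumptions made for this example.

```python
from typing import Dict, List, Optional


def preference_list(packets_by_component: Dict[str, int],
                    interrupt_target: Dict[str, int],
                    all_processors: List[int]) -> List[int]:
    """Order processors by traffic from the components interrupting them."""
    score = {cpu: 0 for cpu in all_processors}
    for component, packets in packets_by_component.items():
        cpu = interrupt_target.get(component)
        if cpu in score:
            score[cpu] += packets
    # sorted() is stable, so ties keep the order of all_processors.
    return sorted(all_processors, key=lambda cpu: -score[cpu])


def choose_from_preferences(preferences: List[int],
                            processor_utilization: Dict[int, float],
                            max_utilization: float) -> Optional[int]:
    """Return the first preferred processor not above the threshold."""
    for cpu in preferences:
        if processor_utilization.get(cpu, 0.0) <= max_utilization:
            return cpu
    return None  # every candidate was too busy during the first period


# Heavy traffic with nic0 (interrupts processor 1), light with nic1
# (interrupts processor 0); processor 2 receives nothing.
prefs = preference_list({"nic0": 9000, "nic1": 300},
                        {"nic0": 1, "nic1": 0}, [0, 1, 2])
chosen = choose_from_preferences(prefs, {0: 40.0, 1: 95.0, 2: 5.0}, 80.0)
```

Here processor 1 heads the list but was too heavily utilized, so the application would be bound to processor 0, the next entry.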

Embodiments of the invention may periodically optimize one or more applications.

FIG. 3 shows an example of a data processing system 300 suitable for implementing embodiments of the invention. The system 300 comprises a first processor 302 and a second processor 304, although alternative systems may include more than two processors. The system 300 includes memory 306. The system 300 may also include a permanent storage device 308, such as a hard disk, and/or a communications device 310 for communicating with a wired and/or wireless network, such as a LAN, WAN, internet or other network. The system 300 may also include a display device and/or an input device, such as, for example, a mouse and/or keyboard.

It will be appreciated that embodiments of the present invention can be realised in the form of hardware, software or a combination of hardware and software. Any such software may be stored in the form of volatile or non-volatile storage such as, for example, a storage device like a ROM, whether erasable or rewritable or not, or in the form of memory such as, for example, RAM, memory chips, device or integrated circuits or on an optically or magnetically readable medium such as, for example, a CD, DVD, magnetic disk or magnetic tape. It will be appreciated that the storage devices and storage media are embodiments of machine-readable storage that are suitable for storing a program or programs that, when executed, implement embodiments of the present invention. Accordingly, embodiments provide a program comprising code for implementing a system or method as claimed in any preceding claim and a machine readable storage storing such a program. Still further, embodiments of the present invention may be conveyed electronically via any medium such as a communication signal carried over a wired or wireless connection and embodiments suitably encompass the same.

All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.

Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.

The invention is not restricted to the details of any foregoing embodiments. The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed. The claims should not be construed to cover merely the foregoing embodiments, but also any embodiments which fall within the scope of the claims.

1.-19. (canceled)
20. A system comprising: a plurality of processors; and memory coupled to the plurality of processors, each processor of the plurality of processors being associated with a respective area of memory, the memory having instructions stored thereon that, when executed by at least one processor of the plurality of processors, cause the at least one processor to: analyze an application for a first period to obtain a first application activity analysis that includes interactions of the application with areas of memory, identify a target processor, of the plurality of processors, that is associated with an area of memory used by the application, according to the interactions of the application with areas of memory included in the first application activity analysis, and bind the application to the target processor.
21. The system of claim 20, wherein the target processor is associated with an area of memory having a greatest number of interactions with the application in the first period, according to the first application activity analysis.
22. The system of claim 20, wherein the first application activity analysis includes utilization of each processor during the first period.
23. The system of claim 22, wherein the instructions that cause the at least one processor to identify the target processor are to identify the target processor from among processors with utilization, according to the first application activity analysis, below a threshold value.
24. The system of claim 20, further comprising instructions that, when executed by the at least one processor, cause the at least one processor to: measure performance of the application for the first period to obtain a first performance analysis, measure performance of the application for a second period after binding of the application to the target processor, to obtain a second performance analysis, compare the first performance analysis and the second performance analysis, and undo the binding if comparison of the first performance analysis and the second performance analysis indicates that performance of the application has degraded after the binding.
25. The system of claim 24, wherein the comparison indicates that performance of the application has degraded if the first performance analysis has a higher cycles-per-instruction value than the second performance analysis.
26. The system of claim 20, wherein the application is of a plurality of applications, and the instructions are to cause the at least one processor to: analyze, over the first period, the plurality of applications to obtain respective first application activity analyses, the first application activity analyses including processor utilization percentage for respective applications, choose, as selected applications, applications having greater processor utilization percentage than other applications of the plurality of applications, create processor preference lists for respective selected applications, where a processor preference list for a selected application list the processors in decreasing order according to numbers of interactions between the processors and the selected application, and bind each of the selected applications to a highest preference processor of a respective processor preference list, the highest preference processor having a total utilization during the first period below a threshold value.
27. The system of claim 26, wherein the applications having greater processor utilization percentage are applications having processor utilization percentage above a predetermined threshold or are a predetermined number of applications having highest processor utilization percentages.
28. A method comprising: analyzing an application for a first period to obtain a first application activity analysis that includes interactions of the application with areas of memory, where different areas of memory are associated with a respective processor of a plurality of processors; identifying a target processor, from the plurality of processors, that is associated with an area of memory used by the application, according to the interactions of the application with areas of memory included in the first application activity analysis; and binding the application to the target processor.
29. The method of claim 28, wherein the target processor is associated with an area of memory having a greatest number of interactions with the application in the first period, according to the first application activity analysis.
30. The method of claim 28, wherein the first application activity analysis includes utilization of each processor during the first period, and the identifying the target processor identifies the target processor from among processors with utilization below a threshold value.
31. The method of claim 28, further comprising: measuring performance of the application for the first period to obtain a first performance analysis, measuring performance of the application for a second period after binding of the application to the target processor, to obtain a second performance analysis; and undoing the binding if the first performance analysis has a higher cycles-per-instruction value than the second performance analysis.
32. The method of claim 28, further comprising: analyzing, over the first period, a plurality of applications that includes the application, to obtain respective first application activity analyses, the first application activity analyses including processor utilization percentage for respective applications; choosing, as selected applications, applications having greater processor utilization percentage than other applications of the plurality of applications; creating processor preference lists for respective selected applications, where a processor preference list for a selected application list the processors in decreasing order according to numbers of interactions between the processors and the selected application; and binding each of the selected applications to a highest preference processor of a respective processor preference list, the highest preference processor having a total utilization during the first period below a threshold value.
33. A non-transitory computer readable medium storing instructions executable by a processor of a system that has a plurality of processors and memory, the non-transitory computer readable medium comprising: instructions to analyze an application for a first period to obtain a first application activity analysis that includes interactions of the application with areas of memory, where different areas of memory are associated with a respective processor of the plurality of processors; instructions to identify a target processor, from among the plurality of processors, that is associated with an area of memory used by the application, according to the interactions of the application with areas of memory included in the first application activity analysis; and instructions to bind the application to the target processor.
34. The non-transitory computer readable medium of claim 33, wherein the instructions to identify the target processor are to identify the target processor associated with an area of memory having a greatest number of interactions with the application in the first period, according to the first application activity analysis.
35. The non-transitory computer readable medium of claim 33, wherein the first application activity analysis includes utilization of each processor during the first period.
36. The non-transitory computer readable medium of claim 35, wherein the instructions to identify the target processor are to identify the target processor from among processors with utilization, according to the first application activity analysis, below a threshold value.
37. The non-transitory computer readable medium of claim 33, further comprising: instructions to measure performance of the application for the first period to obtain a first performance analysis, instructions to measure performance of the application for a second period after binding of the application to the target processor, to obtain a second performance analysis; and instructions to undo the binding if the first performance analysis has a higher cycles-per-instruction value than the second performance analysis.
38. The non-transitory computer readable medium of claim 33, further comprising: instructions to analyze, over the first period, a plurality of applications that includes the application, to obtain respective first application activity analyses, the first application activity analyses including processor utilization percentage for respective applications; instructions to choose, as selected applications, applications having greater processor utilization percentage than other applications of the plurality of applications; instructions to create processor preference lists for respective selected applications, where a processor preference list for a selected application list the processors in decreasing order according to numbers of interactions between the processors and the selected application; and instructions to bind each of the selected applications to a highest preference processor of a respective processor preference list, the highest preference processor having a total utilization during the first period below a threshold value.
39. The non-transitory computer readable medium of claim 38, wherein the applications having greater processor utilization percentage are applications having processor utilization percentage above a predetermined threshold or are a predetermined number of applications having highest processor utilization percentages.